L2 Regularization for Learning Kernels
Abstract
The choice of the kernel is critical to the success of many learning algorithms but it is typically left to the user. Instead, the training data can be used to learn the kernel by selecting it out of a given family, such as that of non-negative linear combinations of p base kernels, constrained by a trace or L1 regularization. This paper studies the problem of learning kernels with the same family of kernels but with an L2 regularization instead, and for regression problems. We analyze the problem of learning kernels with ridge regression. We derive the form of the solution of the optimization problem and give an efficient iterative algorithm for computing that solution. We present a novel theoretical analysis of the problem based on stability and give learning bounds for orthogonal kernels that contain only an additive term O(√p/m) when compared to the standard kernel ridge regression stability bound. We also report the results of experiments indicating that L1 regularization can lead to modest improvements for a small number of kernels, but to performance degradations in larger-scale cases. In contrast, L2 regularization never degrades performance and in fact achieves significant improvements with a large number of kernels.
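To make the setup concrete, the sketch below illustrates one plausible alternating scheme for this kind of L2-constrained kernel learning with kernel ridge regression: it repeatedly fits ridge regression with the current weighted kernel combination and then re-weights the p base kernels within an L2 ball around an initial weight vector. This is a minimal illustration under stated assumptions, not the paper's exact algorithm; the function name, the ball center mu0, the radius Lambda, and the non-negativity clipping step are illustrative choices.

```python
import numpy as np

def learn_kernel_l2_krr(kernels, y, lam=1.0, mu0=None, Lambda=1.0,
                        n_iter=50, tol=1e-6):
    """Illustrative alternating scheme for L2-regularized kernel learning
    with kernel ridge regression (a sketch, not the paper's exact method).

    kernels : array of shape (p, m, m), the p base Gram matrices
    y       : array of shape (m,), regression targets
    lam     : ridge regularization parameter
    mu0     : center of the L2 ball of kernel weights (assumed uniform if None)
    Lambda  : radius of the L2 ball
    """
    p, m, _ = kernels.shape
    mu = np.full(p, 1.0 / p) if mu0 is None else np.asarray(mu0, dtype=float).copy()
    mu0 = mu.copy()
    alpha = np.zeros(m)
    for _ in range(n_iter):
        # Step 1: kernel ridge regression with the current combined kernel.
        K_mu = np.tensordot(mu, kernels, axes=1)        # sum_k mu_k K_k
        alpha = np.linalg.solve(K_mu + lam * np.eye(m), y)
        # Step 2: move the weights toward the kernels most aligned with alpha,
        # staying on an L2 ball around mu0 and clipping to non-negative values
        # (a heuristic projection used here only for illustration).
        v = np.array([alpha @ Kk @ alpha for Kk in kernels])
        mu_new = np.maximum(mu0 + Lambda * v / (np.linalg.norm(v) + 1e-12), 0.0)
        if np.linalg.norm(mu_new - mu) < tol:
            mu = mu_new
            break
        mu = mu_new
    return mu, alpha
```

The alternating structure mirrors the abstract's description of an iterative procedure: each step only requires one ridge-regression solve and one weight update, so the per-iteration cost is dominated by the m-by-m linear system.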
Similar resources
New Generalization Bounds for Learning Kernels
This paper presents several novel generalization bounds for the problem of learning kernels based on the analysis of the Rademacher complexity of the corresponding hypothesis sets. Our bound for learning kernels with a convex combination of p base kernels has only a log p dependency on the number of kernels, p, which is considerably more favorable than the previous best bound given for the same...
Generalization Bounds for Learning Kernels
This paper presents several novel generalization bounds for the problem of learning kernels based on a combinatorial analysis of the Rademacher complexity of the corresponding hypothesis sets. Our bound for learning kernels with a convex combination of p base kernels using L1 regularization admits only a √(log p) dependency on the number of kernels, which is tight and considerably more favorable...
Regularization Strategies and Empirical Bayesian Learning for MKL
Multiple kernel learning (MKL), structured sparsity, and multi-task learning have recently received considerable attention. In this paper, we show how different MKL algorithms can be understood as applications of either regularization on the kernel weights or block-norm-based regularization, which is more common in structured sparsity and multi-task learning. We show that these two regularizati...
Sparsity in Multiple Kernel Learning
The problem of multiple kernel learning based on penalized empirical risk minimization is discussed. The complexity penalty is determined jointly by the empirical L2 norms and the reproducing kernel Hilbert space (RKHS) norms induced by the kernels with a data-driven choice of regularization parameters. The main focus is on the case when the total number of kernels is large, but only a relative...
Fast Learning Rate of Multiple Kernel Learning: Trade-Off between Sparsity and Smoothness
We investigate the learning rate of multiple kernel learning (MKL) with l1 and elastic-net regularizations. The elastic-net regularization is a composition of an l1-regularizer for inducing sparsity and an l2-regularizer for controlling smoothness. We focus on a sparse setting where the total number of kernels is large but the number of non-zero components of the ground truth is relative...